Pii: S0097-8485(01)00074-2
Author
Abstract
The Z-value is an attempt to estimate the statistical significance of a Smith and Waterman dynamic programming alignment score (H-score) through the use of a Monte-Carlo procedure. In this paper, we give an approximation for the Z-value law, deduced from the Poisson clumping heuristic developed by Waterman and Vingron (Stat. Sci. 9 (1994) 367), for the comparison of independent and identically distributed sequences. As for non-gapped alignment scores, our approximation is of Gumbel type, but with parameters that are sequence independent. This result explains the related experimental results reported by Comet et al. (Comput. Chem. 23 (1999) 317). Using ‘quasi-real’ sequences (i.e. randomly shuffled sequences of the same length and amino acid composition as the real ones), we investigate the relevance of our approximation result. Since the Monte-Carlo approach we use biases the estimation of the Gumbel decay parameter, a correction procedure is proposed. Applications to real sequences are considered, and we show how our results can be used to detect potential biological relationships between real sequences. © 2001 Elsevier Science Ltd. All rights reserved.
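The Monte-Carlo Z-value described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the match/mismatch scores, the linear gap penalty, the number of shuffles, the choice to shuffle only the second sequence, and the function names `smith_waterman_score` and `z_value` are all assumptions made for illustration. The shuffling step mirrors the 'quasi-real' sequences of the abstract, which keep the length and composition of the real sequence.

```python
import random
import statistics


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (H-score) of a and b.

    Assumed scoring for illustration: fixed match/mismatch values and a
    linear gap penalty, not a real substitution matrix.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def z_value(a, b, n_shuffles=100, seed=0):
    """Z-value of the H-score of (a, b) against shuffled versions of b.

    Each shuffle preserves the length and composition of b.
    """
    rng = random.Random(seed)
    observed = smith_waterman_score(a, b)
    letters = list(b)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        scores.append(smith_waterman_score(a, "".join(letters)))
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)
    if std == 0:
        raise ValueError("shuffled scores have zero variance; Z-value undefined")
    return (observed - mean) / std


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
    print(z_value(seq, seq))
```

The Z-value is the observed score minus the mean of the shuffled scores, divided by their standard deviation. With a finite number of shuffles both moments are only estimates, which is one reason a bias correction of the kind the abstract mentions can be needed.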
Similar resources
Erratum to "High Order Spatial Discretisations in Electrochemical Digital Simulation. 2. Combination with the Extrapolation Algorithm": [Computers & Chemistry 25(2001) 205-214]
Erratum to “High order spatial discretisations in electrochemical digital simulation. 2. Combination with the extrapolation algorithm” [Computers & Chemistry 25 (2001) 205–214] J. Strutwolf *, D. Britz b,1 a Department of Chemistry, Christopher-Ingold-Laboratories, University College London, 20 Gordon Street, London WC1 0AJ, UK b Kemisk Institut, Aarhus Universitet, 8000 Århus C, Denmark www...
Ab-initio prediction and reliability of protein structural genomics by PROPAINOR algorithm
We have formulated the ab-initio prediction of the 3D-structure of proteins as a probabilistic programming problem where the inter-residue 3D-distances are treated as random variables. Lower and upper bounds for these random variables and the corresponding probabilities are estimated by nonparametric statistical methods and knowledge-based heuristics. In this paper we focus on the probabilistic...
Binary Coding of Kekulé Structures of Catacondensed Benzenoid Hydrocarbons
An algorithm is described by means of which the Kekulé structures of a catacondensed benzenoid molecule (with h hexagons) are transformed into binary codes (of length h). By this, computer-aided manipulations with, and memory-storage of Kekulé structures are much facilitated. Any Kekulé structure can easily be recovered from its binary code.
Consistent Integration of Non-reliable Heterogeneous Information Resources Applied to the Annotation of Transmembrane Proteins
Information agents integrate multiple distributed heterogeneous information sources. The challenging yet unsolved problem that remains, is to ensure the semantic consistency of the integrated data. In this paper we set out to develop a general approach to inconsistency management for information agents. It is implemented as part of the EDITtoTrEMBL system and applied on a large real-world probl...
Artificial Neural Network Method for Predicting Protein Secondary Structure Content
In this paper, the neural network method was applied to predict the content of protein secondary structure elements that was based on 'pair-coupled amino acid composition', in which the sequence coupling effects are explicitly included through a series of conditional probability elements. The prediction was examined by a self-consistency test and an independent-dataset. Both indicated good resu...
Journal title:
Volume, issue:
Pages: -
Publication date: 2001